Word-class embeddings for multiclass text classification

Authors

Abstract

Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in accuracy, using six popular neural architectures and widely used, publicly available datasets for multiclass text classification. One further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is available at https://github.com/AlexMoreo/word-class-embeddings .
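The core idea described in the abstract can be sketched in a few lines: build, for each word, a vector with one dimension per class that summarizes how strongly the word co-occurs with that class in the training set, then concatenate it to the word's pre-trained embedding. The following is a minimal illustrative sketch, not the authors' exact formulation (the paper's implementation at the linked repository should be considered authoritative); the normalized co-occurrence counts used here are one simple instantiation.

```python
import numpy as np

def word_class_embeddings(X, Y):
    """Illustrative sketch of supervised word-class embeddings (WCEs).

    X: (n_docs, n_words) binary document-term matrix
    Y: (n_docs, n_classes) binary document-label matrix
    Returns a (n_words, n_classes) matrix: one row per word, whose
    j-th entry reflects how strongly the word co-occurs with class j.
    """
    # Raw word-class co-occurrence counts
    wc = X.T @ Y                            # (n_words, n_classes)
    # L1-normalize each row so it reads as a class profile per word
    norms = wc.sum(axis=1, keepdims=True).astype(float)
    norms[norms == 0] = 1.0                 # avoid division by zero
    return wc / norms

# Usage sketch (hypothetical shapes): given a pre-trained matrix
# E_pre of shape (n_words, d), the enriched representation is
#   E = np.hstack([E_pre, word_class_embeddings(X, Y)])
# of shape (n_words, d + n_classes), which is then fed to the
# embedding layer of the neural classifier.
```

The concatenation leaves the pre-trained dimensions untouched, so the model can still exploit general lexical semantics while the extra dimensions inject task-specific class information.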


Related articles

Actionable and Political Text Classification using Word Embeddings and LSTM

In this work, we apply word embeddings and neural networks with Long Short-Term Memory (LSTM) to text classification problems, where the classification criteria are decided by the context of the application. We examine two applications in particular. The first is that of Actionability, where we build models to classify social media messages from customers of service providers as Actionable or N...

Using Two-Class Classifiers for Multiclass Classification

The generalization from two-class classification to multiclass classification is not straightforward for discriminants which are not based on density estimation. Simple combining methods use voting, but this has the drawback of inconsequent labelings and ties. More advanced methods map the discriminant outputs to approximate posterior probability estimates and combine these, while other methods...

Word Embeddings for Multi-label Document Classification

In this paper, we analyze and evaluate word embeddings for representation of longer texts in the multi-label document classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are realized on the Czech ČTK and English Reuters-21578 standard corpora. We compare the results of word2vec static and trainable embeddings with randomly initializ...

Bag-of-Embeddings for Text Classification

Words are central to text classification. It has been shown that simple Naive Bayes models with word and bigram features can give highly competitive accuracies when compared to more sophisticated models with part-of-speech, syntax and semantic features. Embeddings offer distributional features about words. We study a conceptually simple classification model by exploiting multiprototype word emb...

Text Segmentation based on Semantic Word Embeddings

We explore the use of semantic word embeddings [14, 16, 12] in text segmentation algorithms, including the C99 segmentation algorithm [3, 4] and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iter...


Journal

Journal: Data Mining and Knowledge Discovery

Year: 2021

ISSN: 1573-756X, 1384-5810

DOI: https://doi.org/10.1007/s10618-020-00735-3